English-Latvian Toponym Processing: Translation Strategies and Linguistic Patterns
Abstract
The paper presents a study of a challenging task in machine translation and cross-language information retrieval: the translation of toponyms. Because of their linguistic and extra-linguistic nature, toponyms deserve special treatment. The overall translation process comprises two stages: dictionary-based translation and out-of-vocabulary toponym translation. The latter is divided into three steps: source string normalisation, translation, and target string normalisation. The translation step applies translation strategies and linguistic toponym translation patterns. Possible translation strategies (transliteration, translation proper, and combinations of the two) and linguistic toponym translation patterns, including multi-word patterns, were investigated and implemented for English-Latvian machine translation. 10,000 UK-related toponyms from Geonames were selected as a development set. Evaluation on a test set showed 67% accuracy in out-of-vocabulary translation: 58% on a set of one-word toponymic units and 81% on a multi-word test set.
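The two-stage process described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the dictionary entry, the generic-term table, the character rules, and all function names are hypothetical placeholders chosen to show how a dictionary lookup might fall back to the three out-of-vocabulary steps (source normalisation, translation, target normalisation).

```python
import re

# Hypothetical toy resources; NOT the paper's actual dictionary or rules.
DICTIONARY = {"london": "Londona"}
GENERIC_TERMS = {"lake": "ezers"}  # illustrative generic-term translation
CHAR_RULES = [("sh", "š"), ("w", "v"), ("y", "i")]  # toy transliteration rules


def normalise_source(name):
    """Step 1: collapse whitespace and lowercase the source string."""
    return re.sub(r"\s+", " ", name).strip().lower()


def transliterate(word):
    """Apply the toy character rules in order."""
    for src, tgt in CHAR_RULES:
        word = word.replace(src, tgt)
    return word


def translate_oov(normalised):
    """Step 2: combined strategy, translating generic terms and
    transliterating everything else."""
    words = normalised.split(" ")
    if len(words) == 2 and words[0] in GENERIC_TERMS:
        # Toy multi-word pattern: "<generic> <name>" -> "<name> <generic>"
        return f"{transliterate(words[1])} {GENERIC_TERMS[words[0]]}"
    return " ".join(transliterate(w) for w in words)


def normalise_target(text):
    """Step 3: capitalise the first letter of the output string."""
    return text[:1].upper() + text[1:]


def translate_toponym(name):
    """Stage 1: dictionary lookup; stage 2: out-of-vocabulary fallback."""
    key = normalise_source(name)
    if key in DICTIONARY:
        return DICTIONARY[key]
    return normalise_target(translate_oov(key))
```

With these toy rules, a dictionary entry is returned directly, while an unknown name passes through all three out-of-vocabulary steps. A real system would need far richer transliteration rules and Latvian morphological handling than this sketch assumes.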
Similar resources
Pattern-based English-Latvian Toponym Translation
Due to their linguistic and extra-linguistic nature toponyms deserve a special treatment when they are translated. The paper deals with issues related to automated translation of toponyms from English into Latvian. Translation process allows us to translate not only toponyms from a dictionary, but out-of-vocabulary toponyms as well. Translation of out-of-vocabulary toponyms is divided into thre...
Topicalization in English Translation of the Holy Quran: A Comparative Study
The Holy Quran, as an Arabic masterpiece, comprises great domains of syntactical, phonological, and semantic literary patterns. These patterns act as constraints on translators. This study examined the application of the most common shift strategies in Catford's linguistic model for translation of topicalization in chapter 29 of the Holy Quran. The topicalized cases were compared to their coun...
Toward a Comparable Corpus of Latvian, Russian and English Tweets
Twitter has become a rich source for linguistic data. Here, a possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia is investigated. Such a corpus, once constructed, might be of great use for multiple purposes including training machine translation models, examining cross-lingual phenomena and studying the population of Riga. This pilot study shows that...
The Representation of Non-Linguistic Sounds in Persian and English Subtitles for the Deaf and Hard-of-Hearing: A Comparative Study
Subtitling for the deaf and hard-of-hearing (SDH) is an area which deserves special attention as it enables these people to access the part of the ‘world’ intended for hearing people, including the world of ‘motion pictures’, and particularly movie sounds. Compared to linguistic sounds, non-linguistic sounds have received little attention in the field of translation, although they are in...
Exploring the Translator's Solutions to the Translation of Conversational Implicatures from English into Persian: the Case of Tolkien's The Lord of the Rings
The present study aimed to examine the translator's solutions to the translation of conversational implicatures from English into Persian. To do so, 120 conversational implicatures were extracted from the novel The Lord of the Rings (Tolkien, 1954) and classified based on Grice's (1975) categorization of Maxims, including quality, quantity, relevance, and manner. Mur Duenas'...